ABSTRACT

As observations of the Epoch of Reionization (EoR) in redshifted 21 cm emission begin, we assess the
accuracy of the early catalog results from the Precision Array for Probing the Epoch of Reionization 
(PAPER) and the Murchison Widefield Array. The MWA EoR approach derives much of its sensitivity 
from subtracting foregrounds to < 1% precision while the PAPER approach relies on the stability and 
symmetry of the primary beam. Both require an accurate flux calibration to set the amplitude of 
the measured power spectrum. The two instruments are very similar in resolution, sensitivity, sky
coverage and spectral range and have produced catalogs from nearly contemporaneous data. We use 
a Bayesian MCMC fitting method to estimate that the two instruments are on the same flux scale to 
within 20% and find that the images are mostly in good agreement. We then investigate the source of 
the errors by comparing two overlapping MWA facets where we find that the differences are primarily 
related to an inaccurate model of the primary beam but also correlated errors in bright sources due 
to CLEAN. We conclude with suggestions for mitigating and better characterizing these effects. 
Subject headings: extra-galactic — catalogs — instrumentation: radio 



1. INTRODUCTION 



Recent interest in very high redshift (6 < z < 12) 21 cm HI emission from the Epoch of
Reionization (EoR, see reviews in Zaldarriaga et al. 2004; McQuinn et al. 2006;
Furlanetto et al. 2006; Morales & Wyithe 2010) has inspired a renaissance of meter
wavelength (ν < 200 MHz) radio astronomy. Several telescopes, including the Giant
Metrewave Radio Telescope (GMRT; Swarup et al. 1991), the Low Frequency Array (LOFAR;
Röttgering et al. 2006), the Murchison Widefield Array (MWA; Tingay et al. 2012;
Bowman et al. 2012), and the Precision Array for Probing the Epoch of Reionization
(PAPER; Parsons et al. 2010) are beginning to characterize foregrounds, perform their
first deep integrations, and set upper limits (Paciga et al. 2011). Both PAPER and the
MWA operate in the southern hemisphere, as will the future Square Kilometer Array (SKA).

The EoR signal will be a small spatial and spectral variation on top of bright
foreground sources (Matteo et al. 2004; Oh & Mack 2003; Jelić et al. 2008;
Bowman et al. 2006). The separation of the EoR from these foregrounds is expected to be
the dominant source of uncertainty and has been the focus of much study. Though the
spatial RMS of the unresolved background was initially calculated to be larger than the
EoR signal (Matteo et al. 2004), later simulations found that the spectral smoothness of
the unresolved background enabled accurate subtraction to acceptable levels in the k
modes of interest (Morales & Hewitt 2004; Morales et al. 2006; Wang et al. 2006;
Jelić et al. 2008; Bowman et al. 2009; Datta et al. 2010; Chapman et al. 2012;
Cho et al. 2012) or just by avoiding the contaminating modes entirely.

Simulations tackling parameter estimation, polarization and foreground subtraction all
assume that all unresolved sources will be removed such that the errors are
indistinguishable from the unresolved point source background both spatially and
spectrally (Liu & Tegmark 2011; Bowman et al. 2009; Liu et al. 2009; Harker et al. 2010;
Gleser et al. 2008; Petrovic & Oh 2011). The level of source residual varies between
these simulations.

While Bowman et al. (2009) assume that subtraction will achieve a 10 mJy residual,
Liu et al. (2009) test a range of scenarios up to 100 mJy residual flux.* Since even the
quietest fields of view contain several 40 Jy sources, these residual levels translate
to removal precision requirements of 0.025% and 0.25% respectively. In contrast, most
radio point source catalogs have flux accuracies in the 5 to 20% range. Studies of
errors in bright source removal are limited. In one simulation that included bright
source subtraction, Datta et al. (2010) found that point-source foregrounds extended
further into the spectral dimension than was previously predicted, into the so-called
"wedge". This turns out to be equivalent to the statement that longer baselines are
contaminated at higher delays, which defines the Parsons et al. "wall" that sets the k
modes accessible to PAPER. In both cases the implication is that the flux accuracy
requirement extends to the spectral dimension.
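
The precision figures quoted above follow from dividing the target residual by the flux of the brightest sources in the field. A minimal sketch of that arithmetic is given here; the function name is illustrative and is not taken from any published pipeline.

```python
def removal_precision_percent(residual_jy, source_jy):
    """Fractional subtraction precision, in percent, needed to leave
    `residual_jy` after removing a source of flux `source_jy`."""
    return 100.0 * residual_jy / source_jy


# Values taken from the text: 40 Jy sources, 10 mJy and 100 mJy residuals.
for residual_mjy in (10.0, 100.0):
    pct = removal_precision_percent(residual_mjy / 1000.0, 40.0)
    print(f"{residual_mjy:.0f} mJy residual on a 40 Jy source: {pct:.3f}%")
```

The same function makes it easy to see how the requirement relaxes for fainter sources: a 1 Jy source needs only 1% precision to reach a 10 mJy residual.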
The requirement of point source subtraction imposes 

* The ultimate flux limit to which sources can be identified and removed is set by the
resolution of the instrument. Bright extragalactic point sources increase in number with
decreasing brightness (Condon et al. 1998; Lane et al. 2008; Baldwin et al. 1983;
Hales et al. 1988; McGilchrist et al. 1990). At some source flux level, the number of
sources per synthesized beam becomes greater than one. For PAPER this limit is ~100 mJy,
while the MWA reaches to tens of mJy.

Table 1

SPECFIND radio continuum source catalogue entries and
estimated uncertainty, adapted from Vollmer et al. (2005)

Survey    ν0      θ      Flux limit     Source    Ref   Error
Name      (MHz)   (')    (mJy)          count           (%)
PMN       4850    3.5    20             50814     1     5
PKS       2700    8.0    50             8264      2     > 3
FIRST     1400    0.083  1              811117    3     5
NVSS      1400    0.75   2              1773484   4
SUMSS     843     0.75   8              134870    5     3
MRC       408     3.0    700            12141     6     7
TXS       365     0.1    250            66841     7     5
WISH      325     0.9    10             90357     8     10
WENSS     325     0.9    18             229420    9     6
MIYUN     232     3.8    100            34426     10    5
4C        178     11.5   2000           4844      11    15
MWA32     150     15     0.5 to 10 Jy   1553      12    20
PAPER32   150     15     10 Jy          486       13    20

Note. References: (1) Wright & Otrupcek (1990); Griffith et al. (199?);
(2) Otrupcek & Wright (1991); (3) White et al. (1997); (4) Condon et al. (1998);
(5) Mauch et al. (2003); (6) Large et al. (1981, 1991); (7) Douglas et al. (1996);
(8) Breuck et al. (2002); (9) Rengelink et al. (1997); (10) Zhang et al. (1997);
(11) Pilkington & Scott (1965); Gower et al. (1967); (12) Williams et al. (2012);
(13) Jacobs et al. (2011)


accuracy requirements in source modeling which have rarely been achieved in practice.
According to estimates of catalog flux accuracy by Vollmer et al. (2005, reproduced in
Table 1), most catalog fluxes at high frequencies agree to within ~5%. The study
included only one catalog in the EoR band (4C), which had by far the largest flux error
(15%). Accordingly, attention has begun to focus on approaches which largely avoid the
need to model and subtract sources to high accuracy (for example, Parsons et al. 2012).
Even in the absence of a need for a highly accurate sky model for EoR experiments,
uncertainty in calibrator flux translates directly into the overall amplitude of the
power spectrum measurement, which limits the constraining power of the observation.
Reliable calibrators are also necessary for modeling the instrument primary beam (which
enters into the k-space window function and noise estimates) and for generating reliable
and repeatable instrument calibrations. For example, attempts to model the primary beam
are currently limited by the accuracy to which source fluxes are known over a wide
enough area of sky to fully sample the beam (Pober et al. 2012).

The construction of a catalog necessarily involves the 
compression and omission of information, but in the con- 
text of the above goals, we can ask three basic questions
when comparing catalogs: 

1. How were the flux scales established for each cata-
log, and are they consistent with each other? This 
is a question about the average properties of the 
catalog fluxes, and does not imply that any partic- 
ular source has an accurate flux. 

2. Are the random errors in the source fluxes, relative 
to the fundamental flux scale, correctly described 
by the error bars presented? 

3. Are there systematic effects, known or suspected, 
which are not reasonably described by the error 
bars given? 



Answering the first question requires establishing a certain source or sources to use as
references, and a method for comparing to them. Ideally, a detailed model exists for the
calibration sources, including their spatial and spectral structure at the frequencies
of interest, as well as a model for their variability, if any. A key reference catalog
for southern hemisphere low frequency radio sources is the fan-beam survey with the
Culgoora Circular Array (CCA; Slee 1995; Slee & Higgins 1975). The CCA produced the
so-called "Culgoora" catalog of fluxes at 80 and 160 MHz. At 160 MHz, the CCA had 1.6'
resolution and a narrow (1 MHz) bandwidth (Sheridan et al. 1973). The CCA catalog's flux
scale is derived from the CKL scale (Conway et al. 1963), as revised in Slee (1995),
which is ultimately tied to the flux of Cassiopeia A. The Culgoora catalog was compiled
from observations over the years 1970 - 1984. Its status as the only low-frequency radio
catalog in the southern hemisphere has placed it at the center of the calibration
schemes for both PAPER and MWA, but it is well to keep in mind that it was a very
different instrument than current EoR telescopes in terms of bandwidth and resolution,
and the Culgoora catalog lacks information on the extent and spectral index of sources.

As for the second and third questions, we expect the various kinds of errors which can
occur in reported fluxes to behave differently according to their origin. Errors
resulting from random noise are the simplest, and are at a value fixed by the local
noise level. In a fractional sense, these errors are worst for the lowest
signal-to-noise sources, and indeed, for S/N < 5, reported fluxes from blind catalogs
tend to be systematically biased high due to so-called Eddington bias (Eddington 1913)
unless precautions are taken. Most surveys at low frequencies are not dominated by their
random errors. For example, the ongoing GMRT 150 MHz survey reaches an RMS noise of
~8 mJy beam^-1, but the flux scale accuracy is limited by systematic errors to about
25%. Errors due to source fitting, photometry, or CLEANing of a given source can all be
expected to scale in proportion to the source flux, since these methods tend to over- or
under-estimate by some fraction of the flux, which means these produce a fixed
fractional error. Sources which are affected by the improperly convolved sidelobes of
another source can expect to have discrepancies in their recovered flux which are
uncorrelated with their flux level. In addition to errors introduced by the data
reduction, other kinds of systematic discrepancies between measurements may be
introduced either by the instrument or by natural processes. On the instrument side,
these effects include an incorrect primary beam model, the presence of radio frequency
interference, or improper bandpass and source spectrum calibration. Physical processes
include ionospheric variability, interstellar medium scintillation, and intrinsic source
variability. Though most catalogs have only a limited model of these kinds of errors
folded into the listed error bars, systematic effects are often discernible.

In this paper, we compare the recently published 

^ Normally referred to as the Culgoora Radio Heliograph (CRH), at nightfall the
telescope became the Culgoora Circular Array (CCA).

^ http://tgss.ncra.tifr.res.in/

Table 2

Observation properties of the catalogs under study

Telescope                    PAPER             MWA
Resolution [arcmin]          15                15
Bandwidth [MHz]              60                92.16
Center Frequency [MHz]       145               154
Integration Time [minutes]   30                40-200 (a)
Image plane RMS [Jy]         2                 0.2
Lower flux limit [Jy]        10                0.5
Catalog method               Targets           Blind
Area covered [sq deg]        36000             2600
Observation dates            May & Sept 2010   March 2010

(a) Integration time varies between the two facets, each of which is a drift scan,
which effectively spreads the integration time across the image.

PAPER (Jacobs et al. 2011) and MWA (Williams et al. 2012)
catalogs. Because the observations were made by
these instruments within months of each other, in over- 
lapping portions of the sky, using very similar configura- 
tions and bandwidths, we expect that disagreements be- 
tween sources due to time variation, spectral slope and 
confusion are minimized. Nevertheless, specific instru- 
ment differences including modeling of the primary beam 
and CLEANing remain between the two catalogs, as well 
as differing fields-of-view, noise depth, and catalog con-
struction method. Our goal is to further understand the 
origins of errors in published source fluxes of catalogs 
from EoR surveys. By limiting the data scope to only 
published data, we will characterize the degree to which 
published catalogs provide all the information necessary 
to reconstruct the sky model. This will further the over- 
arching goal of refining our ability to reliably describe 
and exchange sky models for the purposes of calibration 
and consistency checks. 

The outline of the paper is as follows. The PAPER and MWA observations are described in
Section 2. Section 3 compares the PSA32 and MWA32 catalogs in their region of overlap
and introduces a robust statistical comparison method that uses a Markov Chain Monte
Carlo (MCMC) algorithm to compute the relative flux scale and its error. Section 4 looks
for systematic effects in both data sets by internal comparison of the MWA data and
comparison of the PAPER data against the Culgoora catalog. Section 5 summarizes the
various errors identified, and concludes with recommendations for future EoR foreground
cataloging and results comparison efforts.

2. THE MWA32 AND PSA32 DATA SETS 

The MWA in Western Australia and PAPER in South Africa are both actively observing as
their commissioning progresses. As part of the EoR effort the observers are generating
"global sky models", a key component of which is a point source catalog. First-look
catalogs using data taken during 2010 are now available from both instruments. Relevant
data about the two catalogs are listed in Table 2. Both instruments operated 32-antenna
arrays centered at an observing frequency near 150 MHz with similar antenna layouts and
bandwidth that result in an apparent resolution of ~15'.^

^ Sources having more power at lower frequencies can have an effective synthesized beam
35' wide, compared with sources brighter at the higher end of the band, which would have
an effective width of only 15'.



The PAPER data set (Jacobs et al. 2011, hereafter PSA32) consisted of observations on
two nights separated by 3 months, using 32 single-polarization dipoles with 60 MHz of
bandwidth centered at 145 MHz. Both nights were used to make a mosaic covering the
entire sky with δ < 10°. The brightest two sources were used for phase calibration and
then filtered in delay-delay rate space (Parsons & Backer 2009). The visibilities were
imaged in ten minute transit "snapshots" and then mosaicked into a single HEALPix
(Górski et al. 2005) image. An image-based CLEAN (Högbom 1974) was performed on the
brightest sources, but most of the image was left unCLEANed. For this reason the depth
of the catalog was kept to the brightest few sources in the sky. The fluxes given in the
PSA32 catalog are the peak flux within 30' of locations of catalog sources chosen from
the Molonglo Reference Catalog (MRC; Large et al. 1981, 1991). In a selection designed
to be complete at the minimum flux, it includes all sources above 10 Jy as extrapolated
to 150 MHz using the catalog spectral index. The PSA32 fluxes compared to MRC and
Culgoora showed a similar range of variance about unity flux scale as the MRC and
Culgoora showed between themselves.

The MWA32 images were made from several nights of data in March 2010 from scans of two
fields centered on RA 9h18m6s, Dec -12d05m45s (Hydra A) and RA 10h20m0s, Dec -10d0m0s
(EoR2). Imaging was performed in three 30 MHz bands which were averaged into one 90 MHz
wideband image on each field. In this average the three maps were weighted by a positive
spectral index of 0.8 to compensate for the average spectral index of -0.8. For sources
with a spectral index of -0.8, this will increase the perceived flux by 2.5% as well as
slightly shrink the effective PSF by emphasizing higher frequency data. The images
include both more integration time and more snapshots than the PSA32 observations, and
were cleaned to a much deeper level. The MWA catalog sources were found blindly in this
wideband image, without any catalog prior. Peaks having SNR > 3, where the noise level
is the average nearby image RMS, were fit with two dimensional Gaussians. The SNR = 3
sources range in RMS from 167 mJy to 3 Jy, and from 0.5 to 10 Jy in amplitude, as the
noise varies across the map. The catalog lists the Gaussian amplitude of all fits that
converged, but not the sizes and orientations of the Gaussians. The derived fluxes were
found to be within 30% agreement of the MRC predicted flux, which was then given as the
data point uncertainty.

The PAPER flux scale was derived by calibrating each epoch to a single Culgoora source,
using 1422-297 for the May and 0521-365 for the September data. The calibration was
effectively applied to the entire image by the use of a primary beam model. The MWA flux
scale was derived from an ensemble of sources with fluxes at 80 and 160 MHz from the
CCA, and 408 MHz from MRC, so the fluxes used by Williams et al. (2012) were not
precisely those of the CCA 160 MHz catalog, though they are of course closely tied to
them. The use of Culgoora by both instruments to set a flux scale does not of course
allow us to address the absolute accuracy of the measurements, which ultimately depends
on the CKL flux scale. The applicability of the Culgoora fluxes is more generally
subject to some concern. The narrow bandwidth of Culgoora and the lack of precise
spectral index information mean that, integrated over ~100 MHz of bandwidth, a source
with a spectral index α = -1 will appear 5% brighter than a narrow spectrum measurement.
Large scale structure invisible to the CCA could substantially boost the flux for
resolved sources observed by dense aperture arrays like the MWA or PAPER. As shown in
Figure 1, the MWA and PAPER 32 antenna arrays are much more compact and have little
overlap with the long baselines of the CCA. The image shows the narrow-band uv coverage;
in fact, PAPER and MWA cover nearly 100 times as much uv space in a multi-frequency
synthesis image.
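
The size of the wide-band brightening for a steep-spectrum source can be estimated by averaging a power law over the band. The sketch below assumes an ideal power law S(ν) ∝ ν^α and uniform weighting in frequency, which is a simplification of real multi-frequency synthesis weighting; the function name and the example band edges are illustrative assumptions. Under these assumptions the excess comes out at the few-percent level, the same order as the figure quoted above.

```python
import math


def band_averaged_ratio(alpha, nu_center_mhz, bandwidth_mhz):
    """Ratio of the band-averaged flux to the flux at band center for a
    power-law source S(nu) ~ nu**alpha, assuming uniform frequency weights."""
    lo = nu_center_mhz - bandwidth_mhz / 2.0
    hi = nu_center_mhz + bandwidth_mhz / 2.0
    if abs(alpha + 1.0) < 1e-12:
        # Integral of nu**-1 is a logarithm; normalise by nu_center**-1.
        integral = math.log(hi / lo) * nu_center_mhz
    else:
        integral = (hi ** (alpha + 1.0) - lo ** (alpha + 1.0)) / (alpha + 1.0)
        integral /= nu_center_mhz ** alpha
    return integral / bandwidth_mhz


# Illustrative case: alpha = -1 over a 100 MHz band centered at 150 MHz.
excess = band_averaged_ratio(-1.0, 150.0, 100.0) - 1.0
print(f"band-averaged excess: {100.0 * excess:.1f}%")
```

A flat-spectrum source (α = 0) returns a ratio of exactly one, which is a quick sanity check on the normalisation.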



[Figure 1 panels: PAPER, MWA, Culgoora]




Figure 1. The uv sampling of PSA32 and MWA32 are very simi- 
lar in scale and coverage density, with baselines between a few and 
1000 meters, but are very different from larger instruments like the 
CRH/CCA (which made the majority of the southern hemisphere 
flux measurements at 150 MHz), whose shortest baseline was 100 
m. For this reason we focus here on a comparison between PA- 
PER and MWA. The uv coverage is shown at a single 150 MHz 
channel which is representative for the Culgoora 1 MHz passband. 
Both PAPER and MWA images were made over ~100 MHz of 
bandwidth, and thus have ~100 times more uniform uv coverage 
in multi-frequency synthesis images. 

Despite the high level of similarity between the two data sets, there are still
important differences which should be carefully noted. Probably most important are the
differences in image depth and area. The PSA32 images incorporate data from many
different pointings to smoothly map the sky; signal-to-noise is relatively constant
across the image, but due to limited deconvolution the dynamic range is lower. The MWA
images are more deeply deconvolved but limited in extent. The difference in SNR between
the middle and edges is pronounced and comparable to the areas of the PSA32 map
dominated by side-lobes. Figure 2 directly compares the images of the overlap region
from both instruments.

In addition, spectral slope across the wide ~80 MHz bandwidths used by PAPER and MWA
could also be a source of intrinsic measurement difference. The images used to build
both catalogs incorporated data across the band in a multi-frequency synthesis and thus
are unable to directly measure spectral index. The bandwidths are different by 30%
which, for sources with large spectral slope,^ will result in slightly different
spectral averages. Furthermore, the MWA32 sub-bands were weighted by the typical
spectral index of α = -0.8, while the PAPER spectrum was not. This will cause most
sources to be on average 5% brighter for MWA than for PAPER; additional spectral
variations between sources will introduce another ~1% variation around this number.

3. FLUX SCALE COMPARISON BETWEEN INSTRUMENTS 

^ Most radio sources in this band have power law spectra $S(\nu) \propto \nu^{\alpha}$.
The average spectral index for radio sources in this band is -0.8.



Each catalog provides a list of sources, each with a flux and flux uncertainty. The
PSA32 catalog lists peak flux and surrounding rms, while the MWA32 catalog lists fitted
flux and fractional error, assumed to be constant at 30%. For the purposes of the
following analysis, we assume these errors correctly describe the instrumental
uncertainties. This question is explored further in Section 4.

In the region of overlap between the two surveys, there 
are 60 MWA entries within 30' of 41 PSA32 sources. Of 
these 41 PSA32 sources, 13 have multiple MWA com- 
ponents while the rest are 1:1 matches. In the case of 
multiple component matches, we pair sources with the 
highest flux. Images of the regions under comparison 
along with markers for the 41 overlapping sources are 
shown in Figure 2.
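
The matching rule described above (all entries within 30' of a target, keeping the brightest when several match) can be sketched as follows. This is an illustration under stated assumptions, not the code used for the published comparison: the tuple layouts, function names and example coordinates are hypothetical.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (inputs in degrees), haversine form."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2.0) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2.0) ** 2)
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h))))


def match_brightest(targets, candidates, radius_arcmin=30.0):
    """For each target (name, ra, dec) return the brightest candidate
    (name, ra, dec, flux) within the search radius, or None if there is none."""
    radius_deg = radius_arcmin / 60.0
    matches = {}
    for t_name, t_ra, t_dec in targets:
        nearby = [c for c in candidates
                  if angular_separation_deg(t_ra, t_dec, c[1], c[2]) <= radius_deg]
        matches[t_name] = max(nearby, key=lambda c: c[3]) if nearby else None
    return matches


# Hypothetical positions and fluxes, for illustration only.
targets = [("T1", 134.80, -25.70), ("T2", 116.40, -19.10)]
candidates = [("a", 134.80, -25.70, 43.0), ("b", 134.82, -25.71, 6.0),
              ("c", 140.00, -25.70, 100.0)]
print(match_brightest(targets, candidates))
```

Keeping only the brightest component is the simplest choice; summing the matched components is an alternative that would change the comparison for blended sources.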

Two of these sources provide instructive examples. Figure 3 shows the PAPER and MWA
images for two of the brightest sources which are listed in both catalogs and have
multiple MWA components within 30' of a single PAPER source. The first, J0859-257,
demonstrates the importance of both CLEAN and cataloging method. The MWA32 catalog lists
two sources in virtually the same location. (They are separated by 1.4', or 1/10 of a
synthesized beam, and were given the same truncated J2000 name.) Meanwhile, the PAPER
image, which was not cleaned to this level, has deep side-lobes and excess flux not
visible in the MWA. Together these effects contribute to a 180% flux difference between
the two (28 Jy for PAPER, (43+6) Jy for MWA). The second source shown, J0745-191, is a
classic example of resolution confusion: two sources whose point spread functions
significantly overlap. Despite this, the two instruments agree on the brighter flux to
17%.

Having obtained a list of corresponding sources, we 
wish to ask whether the two instruments produce mea- 
surements that are consistent with being on the same 
flux scale, given their reported errors. We thus compute 
the likelihood that the PSA32 fluxes Sp are related to 
the MWA fluxes Sm by a simple linear fit, with devia- 
tions from this relation due solely to random errors as 
described by both instruments' error bars. The likelihood
function is given by Hogg et al. (2010) in their §7.
We implement a Markov Chain Monte Carlo sampler to 
sample the posterior probability. At each step of the 
Markov chain we compute the error and distance of each 
point as projected orthogonally to the current direction 
of the line. These differences and errors are then used to 
form a Gaussian likelihood. The free parameters are the 
slope and the offset of the flux-flux line. 
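
The sampling procedure described above can be sketched with a plain Metropolis sampler. This is not the authors' code: it assumes uncorrelated Gaussian errors on both fluxes, flat priors on slope and offset, hand-picked proposal step sizes, and a synthetic data set generated by a hypothetical helper, all chosen only to illustrate the orthogonal-projection likelihood.

```python
import math
import random


def log_likelihood(m, b, data):
    """Gaussian log-likelihood for the line y = m*x + b using distances and
    errors projected orthogonally to the line (uncorrelated x and y errors)."""
    theta = math.atan(m)
    s, c = math.sin(theta), math.cos(theta)
    total = 0.0
    for x, y, sx, sy in data:
        delta = -s * x + c * y - b * c       # orthogonal distance to the line
        var = (s * sx) ** 2 + (c * sy) ** 2  # error variance along that normal
        total -= 0.5 * delta * delta / var
    return total


def metropolis(data, n_steps=20000, start=(1.0, 0.0), step=(0.005, 0.15), seed=1):
    """Random-walk Metropolis over (slope, offset) with flat priors (assumed)."""
    rng = random.Random(seed)
    m, b = start
    logp = log_likelihood(m, b, data)
    chain = []
    for _ in range(n_steps):
        m_new = m + rng.gauss(0.0, step[0])
        b_new = b + rng.gauss(0.0, step[1])
        logp_new = log_likelihood(m_new, b_new, data)
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < logp_new - logp:
            m, b, logp = m_new, b_new, logp_new
        chain.append((m, b))
    return chain


def synthetic_catalog_pair(n=40, slope=1.3, sigma=1.0, seed=0):
    """Hypothetical test data: n sources between 10 and 50 Jy, both axes noisy."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        truth = 10.0 + 40.0 * i / (n - 1)
        data.append((truth + rng.gauss(0.0, sigma),
                     slope * truth + rng.gauss(0.0, sigma), sigma, sigma))
    return data


if __name__ == "__main__":
    chain = metropolis(synthetic_catalog_pair())
    slopes = sorted(m for m, _ in chain[len(chain) // 2:])  # drop burn-in half
    print("median slope:", slopes[len(slopes) // 2])
```

Discarding the first half of the chain as burn-in is a conservative, arbitrary choice; a production analysis would check convergence explicitly and might prefer priors flat in angle rather than slope.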

Figure 2. A side-by-side comparison of PSA32 (left) and MWA32 (right) images under study here. The PAPER image is a mosaic of
several snapshots that have been weakly cleaned. The bright side-lobes are due to residual Hydra A flux remaining after delay-delay rate
filtering. The MWA mosaic is formed by averaging the two facets in Williams et al. (2012) with a 10°-wide gaussian weight. The MWA
images are composed of several drift scans and, while having a variable noise across the image, do not have a simple corresponding set of
primary beam weights. Sources found in both catalogs are marked with black circles. Both images are centered on RA 9h45m, Dec -10d, are
70 degrees wide by 50 degrees tall, and have a pixel size of 3'. The color scale is set so that 90% of the flux scale is black.

Figure 3. A side-by-side comparison of two previously known sources (L: 0859-257, R: 0745-191) extracted from the mosaics in Figure 2.
For each source, PAPER is on the left and MWA on the right, with MWA32 catalog sources marked with an X; PSA32 listed the position and
amplitude of the peak within 30' of the image center. The left source provides examples of errors from both instruments. The MWA catalog
lists two sources separated by 1.4', or 10% of a synthesized beam, which were even given the same truncated J2000 name. Meanwhile, the
PAPER image, having not been CLEANed to this flux level, has larger side-lobes. Together these effects contribute to a 180% flux
difference between the two (28 Jy for PAPER, (43+6) Jy for MWA). However, differences in deconvolution do not preclude an accurate
comparison, as shown on the right, where a source has multiple confused components yet the PAPER flux is within 17% of the MWA flux.

The posterior probability distribution is shown in Figure 5. The most likely flux
relationship occurs at the peak of the posterior (shown in Figure 4), and the confidence
interval is defined as the contours of the posterior sampling. Marginalizing over the
flux offset, we find a distribution of flux scales which peaks at 1.05 and has a 73%
confidence limit of 0.8 to 1.19, or 20% at 1σ. The peak position is consistent with the
offset due to the small spectral index correction in the construction of the MWA32
wideband images and is consistent with MWA32 and PSA32 sharing the same flux scale. It
should be emphasized that this is a more robust and correct determination of relative
agreement between catalogs than either the flux-ratio histogram method implemented by
Jacobs et al. (2011) or the average flux ratio of Williams et al. (2012).
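
A marginalized interval of the kind quoted for the flux scale can be read directly off the slope samples of a chain. The sketch below uses a central (equal-tailed) interval, which is an assumption: the text defines its interval from contours of the posterior sampling, and the two definitions differ for skewed distributions. The stand-in samples are drawn from a hypothetical Gaussian purely for illustration.

```python
import random


def central_interval(samples, fraction=0.68):
    """Median and the equal-tailed interval containing `fraction` of samples."""
    ordered = sorted(samples)
    n = len(ordered)
    lo_idx = int(round((0.5 - fraction / 2.0) * (n - 1)))
    hi_idx = int(round((0.5 + fraction / 2.0) * (n - 1)))
    return ordered[(n - 1) // 2], ordered[lo_idx], ordered[hi_idx]


# Hypothetical stand-in for marginalized slope samples from an MCMC chain.
rng = random.Random(0)
slope_samples = [rng.gauss(1.05, 0.2) for _ in range(5000)]
median, low, high = central_interval(slope_samples)
print(f"flux scale {median:.2f}, interval {low:.2f} to {high:.2f}")
```

Passing `fraction=0.73` would mirror the confidence level quoted in the text.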



4. SYSTEMATIC EFFECTS IN THE CATALOGS 

Both the PSA32 and MWA32 source catalog errors are 
almost certainly not dominated by thermal noise. To as- 
sess the origin of errors, it is necessary now to turn to pos- 
sible sources of systematic errors, and, for this purpose, it 
is desirable to have a reference to compare against. Since 
the MWA32 catalog is derived from sources found in two 
facets, it is possible to use intercomparison between the 
two facets as a diagnostic of systematic errors. While in 
principle a similar approach could be used for the PA- 
PER images, the individual PAPER snapshots were of a 
limited signal-to-noise, and thus intercomparison is not 
very meaningful. For this reason the individual facets 
were neither published nor included in this study. Thus 
for PAPER, we look for systematic errors by comparing 
against the CCA catalog. 

To simplify the analysis we will compare the peak fluxes, rather than the Gaussian fits
used in Williams et al. (2012). This also simplifies the comparison to the PSA32 catalog
(Section 3), which also used peak fluxes. To test the actual amount of flux error when
using the two methods we compare the MWA32 peak fluxes with the fit fluxes listed in the
catalog. The amount of disagreement ranges from a median of < 1% in the Hydra A field to
8% in the EoR2 field. As we will see, this error is much smaller than other effects we
will identify.

Occasionally, several MWA sources were closer together than 30', causing the peak finder
to sometimes find duplicate flux measurements. After eliminating ~10 sources within 30'
of each other, we compute the median and rms facet to facet fractional difference.
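
The median and RMS statistics can be sketched as below. The normalisation is an assumption, since the text does not spell it out: here the absolute facet-to-facet difference is divided by the mean of the two fluxes. The example fluxes are invented to show how a single discrepant source inflates the RMS while leaving the median nearly unchanged.

```python
import math
import statistics


def fractional_differences(flux_a, flux_b):
    """Absolute difference between paired fluxes divided by their mean."""
    return [abs(a - b) / (0.5 * (a + b)) for a, b in zip(flux_a, flux_b)]


def summarize(fracs):
    """Return (median, rms) of a list of fractional differences."""
    median = statistics.median(fracs)
    rms = math.sqrt(sum(f * f for f in fracs) / len(fracs))
    return median, rms


# Invented facet fluxes in Jy; the last pair is a deliberate outlier.
facet_1 = [1.0, 2.0, 10.0]
facet_2 = [1.0, 2.2, 5.0]
median, rms = summarize(fractional_differences(facet_1, facet_2))
print(f"median {100 * median:.1f}%, rms {100 * rms:.1f}%")
```

An RMS well above the median, as in this toy case, is the signature of a non-Gaussian tail rather than of uniformly larger scatter.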

In this large sample of 539 sources, the distribution of the fractional errors is peaked
around 16% but extends beyond 100%, a state reflected in its RMS of 39.9% and median
value of 13%. The distribution of the errors is shown in Figure 6. The best fit
histogram has a width of 37%, though a width of 20% seems to better reflect the center
of the distribution, which, as we will see below, suggests that the errors are
non-gaussian and most likely systematic.

Jacobs et al.

[Figure 4 axis label: S_PAPER [Jy]]

Figure 4. Fitting a linear relationship between MWA and PSA32
in the presence of error bars. The PSA32 errors are image plane
RMS in an annulus around the source, while the MWA error bars
are fractional between 30 and 80%, depending on distance from
image center. The line represents the peak of the posterior and the
blue region indicates 1σ confidence.

Though many sources are visible in both facets, only the subset found in the primary
field of view^ (26° FWHM at 190 MHz) have comparable instrumental error. Indeed, the
median uncertainty of these 63 sources is very similar to the larger sample at 16%, but
the RMS is much smaller at only 29%.



[Figure 5 axis label: flux scale S_MWA/S_PAPER [m]]

Figure 5. The posterior probability distribution of the PAPER/MWA flux calibration. The
output of the flux relationship fit is a series of samples of the model parameters,
slope (m) and intercept (b). The occupation number of each m,b value is a sample of the
posterior probability and the projection down to either variable gives the marginalized
distribution. The histograms on the sides give the marginalized likelihood. The
marginalized flux scale or slope is analogous to the distribution of flux scales used in
Jacobs et al. (2011). The peak probability occurs at a flux scale of unity and intercept
of -0.5; the contour shown encloses the solutions having 76% probability. The slope of
the probability distribution is steep; 95% probability density contours were not
significantly different enough to be over plotted. The marginalized slope posterior,
labeled as "flux scale" to which it is roughly analogous, reaches the 76% level at 0.8
and 1.2, indicating that the flux scale is correct to within 20%.



4.1. Errors Due to Primary Beam 

Meanwhile, the opposite is true of flux difference versus Right Ascension, as is shown
in Figure 7. Sources above 1 Jy show a clear linear trend in flux difference with Right
Ascension, changing by as much as 200% over 25 degrees of longitude. No trend is
observed in the Declination direction.

A second systematic effect was apparent in the faint, unCLEANed sources (< 5 Jy) of the
MWA32 facets: a distinct systematic, monotonic trend in facet disagreement (Figure 7).
The RA dependence of the disagreement is consistent with the expected difference between
two facets with similar scale beam errors, as illustrated in Figure 9. The true flux of
each facet image is estimated by dividing the perceived flux by a model of the primary
beam. The models used for both MWA and PAPER are based on simulations. When the model
does not match reality the flux scale will incorrectly be seen to increase or decrease
uniformly towards the beam edges. When two pointings are differenced, the errors on the
opposing edges will have opposite signs. The scale of the error,
~ 40% at field of view edges, is consistent with the tests
The actual effective beam will be complicated by the inclu- 
sion of several pointings and bands, all of which have measurably 
different patterns. This sample, which includes only the published 
maps and the known primary beam size, probably best describes 
the uncertainty in the MWA32 catalog. 
Figure 6. The two overlapping MWA facets provide an opportunity 
to examine the sources of errors. Here we examine the distribution 
of fractional difference in the facet flux of MWA32 catalog 
sources. The distribution is non-gaussian, which causes the gaussian 
fit (dotted) to clearly over-estimate the amount of error at 
37% compared with the 20% error model found by comparing with 
PAPER (dashed). In these images most of the sources with 50%+ 
error appear to be the result of an imperfect primary beam model 
correction (c.f. Fig. 7 showing this error vs RA and Fig. 9 giving 
an explanation for the shape of that relation). 



of MWA antenna tiles in an anechoic chamber at Lincoln 
Labs, where the MWA tile responses were found to dif- 
Figure 7. Fractional difference between peak fluxes in the two 
overlapping MWA32 facets as a function of Right Ascension. No 
trend is observed in the Declination direction. The fractional 
difference near the middle, where the facet overlap is best, averages 
around 20% and rises to over 100% at the periphery. The shape 
is similar to what one would expect from use of an inaccurate 
primary beam model, a problem endemic to both PAPER and MWA. 
See Figure 9 for a cartoon explanation. 




Figure 9. A cartoon of a 90° azimuth (East-West) cut through 
two adjacent primary beams of any wide field telescope, to compare 
with the systematic difference shown in Figure 7. When the model 
of the primary beam (solid black) is applied in place of the true 
beam (dashed), the error is manifested as a characteristic flux scale 
that varies with position (dashed lines). When two pointings are 
differenced, the errors on the opposing edges will have opposite 
signs. The effect will be most pronounced when comparing sources 
occurring at the extremes of both beams along the axis bisecting 
both facets (here Right Ascension). 
Figure 8. Spatial distribution of PSA32/Molonglo Reference Catalog 
flux ratio and local PSA32 rms in Galactic (top) and Equatorial 
(bottom) coordinates (from Jacobs (2011)). Point size indicates 
flux scale as shown in the inset key; local image rms is indicated 
by color. The area of high flux scale appears to be correlated with 
high rms in upper latitudes, particularly near bright sources far 
from pointing center. Though the error is not strictly linear with 
distance from the suspected source of side-lobes, inspection of the 
image suggests that insufficient deconvolution of Hercules A and 
the Crab is to blame. 



4.2. Errors Due to CLEAN Algorithm 

As we saw when comparing images in Fig. 3, different 
levels of deconvolution affect the degree to which the PAPER 
and MWA images agree. This effect is also noticeable 
when comparing the two MWA facets, which were 
cleaned independently. 
fer from the model by 1.34 dB (36%) at 15° from zenith 
(Williams 2012). 
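
The beam argument above can be made concrete with a small numerical sketch. It is illustrative only: it assumes Gaussian beam shapes, and the beam widths, facet centers and function names are hypothetical values that are not taken from either instrument. It shows the dB conversion quoted for the chamber tests and how differencing two facets corrected with the same wrong beam model gives errors of opposite sign at opposite edges.

```python
import numpy as np

def db_to_fractional(db):
    """Convert a power ratio in dB to a fractional difference."""
    return 10.0 ** (db / 10.0) - 1.0

def gaussian_beam(offset_deg, fwhm_deg):
    """Gaussian beam response at an angular offset from the pointing center."""
    sigma = fwhm_deg / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return np.exp(-0.5 * (offset_deg / sigma) ** 2)

def recovered_flux_ratio(ra_deg, center_deg, fwhm_true, fwhm_model):
    """Recovered/true flux when data seen through the true beam are
    corrected by dividing by the (incorrect) model beam."""
    offset = ra_deg - center_deg
    return gaussian_beam(offset, fwhm_true) / gaussian_beam(offset, fwhm_model)

# Hypothetical geometry: two facets 15 degrees apart in RA, with a true beam
# slightly narrower than the model beam used for the correction.
ra = np.linspace(-12.5, 12.5, 11)
east = recovered_flux_ratio(ra, -7.5, fwhm_true=25.0, fwhm_model=27.0)
west = recovered_flux_ratio(ra, +7.5, fwhm_true=25.0, fwhm_model=27.0)

# Fractional disagreement between the two facets for the same sources
frac_diff = (east - west) / (0.5 * (east + west))

print(f"1.34 dB corresponds to {100 * db_to_fractional(1.34):.0f}%")
for r, d in zip(ra, frac_diff):
    print(f"RA offset {r:+6.1f} deg  facet disagreement {100 * d:+6.1f}%")
```

Under these assumptions the disagreement vanishes midway between the pointings and changes sign monotonically across the overlap, which is the qualitative behaviour sketched in the cartoon of Figure 9.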

MWA primary beams will be holographically mapped 
and calibrated during commissioning of the final array 
configuration. This data was not yet available for the 
MWA32 catalog. Because the MWA images from this 
study used several tile aperture pointings and the more 
limited theoretical holographic model, the uncertainty 
in the beam model was large. Other experiments by 
Bernardi et al. (2012), observing in drift scan mode using 
only a single, well characterized pointing, were able to 
find closer flux/catalog agreement. Methods that utilize 
the holographic beam in the deconvolution process are 
now being tested that will significantly reduce this 
systematic (Morales & Matejek 2009; Sullivan et al. 2012; 
Tasse et al. 2012). 



Figure 10. Fractional difference between peak fluxes in two over- 
lapping MWA32 facets versus distance from Hydra A, which is 7 
times brighter than the next brightest source in either image. The 
error in bright sources (> 1 Jy, black squares) generally tracks that 
of the full set of sources (gray dots), but the bright sources nearest 
Hydra A show a depression of their flux consistent with sitting in 
a negative sidelobe of Hydra A. The black line is not a formal fit 
but shows the systematic suppression of bright sources < 8 degrees 
from Hydra A. 



In this limited selection, systematic errors 
are less obvious. The most suggestive is a possible 
linear increase in error with proximity to Hydra A, 
shown in Figure 10. Hydra A is 7 times brighter than the 
next brightest source. During the first CLEAN iterations 
the model will contain only Hydra A. When CLEAN begins 
to model flux at the level corresponding to the next 
brightest sources, it must decide how to divide up the fluxes of 
nearby sources whose side-lobes significantly overlap. If 
it divides incorrectly (putting the flux of one source into 
another), CLEAN enters a false minimum from which it 
cannot escape. The result will be that models of sources 
near very bright sources will be more corrupted. The 
images were CLEANed to 1% of the peak flux, or about 4 
Jy for EoR2 and 6 Jy for the Hydra A field. The fact 
that the error does not affect sources below 5 Jy suggests 
that the error is related to CLEAN converging on a false 
minimum. 
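
The iteration described above can be sketched as a minimal one-dimensional Högbom CLEAN. This is not the imaging software used for either catalog. The sinc-shaped dirty beam, the pixel positions, the loop gain and the helper names are all hypothetical, and the sketch does not reproduce the false minimum itself. It only shows the loop structure, the 1%-of-peak stopping threshold, and how a source sitting in a negative side-lobe of a neighbor 7 times brighter appears depressed in the dirty image.

```python
import numpy as np

def apply_psf(image, psf):
    """Convolve a length-n image with a length 2n-1 PSF centered at index n-1."""
    n = len(image)
    k = np.arange(n)
    return np.array([np.dot(image, psf[n - 1 + j - k]) for j in range(n)])

def hogbom_clean_1d(dirty, psf, gain=0.1, threshold=0.01, max_iter=10000):
    """Minimal 1-D Hogbom CLEAN; returns (model, residual)."""
    n = len(dirty)
    residual = np.array(dirty, dtype=float)
    model = np.zeros(n)
    for _ in range(max_iter):
        peak = int(np.argmax(np.abs(residual)))
        if abs(residual[peak]) < threshold:
            break
        component = gain * residual[peak]
        model[peak] += component
        # Subtract the PSF shifted so that its center sits on the peak pixel
        residual -= component * psf[n - 1 - peak : 2 * n - 1 - peak]
    return model, residual

n = 64
offsets = np.arange(-(n - 1), n)
psf = np.sinc(offsets / 4.0)   # hypothetical dirty beam with negative side-lobes

sky = np.zeros(n)
sky[30] = 7.0                  # bright source, 7 times brighter than its neighbor
sky[36] = 1.0                  # neighbor placed in the first negative side-lobe

dirty = apply_psf(sky, psf)
threshold = 0.01 * dirty.max() # stop at 1% of the peak flux
model, residual = hogbom_clean_1d(dirty, psf, gain=0.1, threshold=threshold)

print(f"dirty-image value at the faint source: {dirty[36]:+.2f} (true flux 1.00)")
print(f"model flux at pixels 30 and 36: {model[30]:.2f}, {model[36]:.2f}")
```

Because each iteration moves flux from the residual into the model through the same PSF, the dirty image always equals the model convolved with the PSF plus the residual, whatever division of flux between neighboring sources the algorithm has settled on.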

Deconvolution and primary beam errors illustrated 
by the MWA32 data are present in varying degrees in 
the PSA32 data as well. The PAPER images are not 
CLEANed as deeply as the MWA32 images. However, 
as we see in Figure 8, which shows the ratio of PAPER 
to catalog values, when compared to other catalogs the 
largest errors were found to cluster near bright sources 
beyond the imaged region at low elevation in the primary 
beam (Jacobs 2011). The errors did not increase 
with distance and appear to be due to side-lobes from 
the sources indicated in the figure. Recent analysis of 
measured source tracks has found the PAPER beam to 
be accurate to between 10% and 15%, though sources 
can have individual errors of 20% or occasionally more 
(Pober et al. 2012). 

5. CONCLUSIONS AND RECOMMENDATIONS 
We summarize our conclusions as follows: 

1. The PSA-32 and MWA-32T catalogs are on the 
same flux scale, consistent with their stated errors 
(flux agreement of 20% at a probability of 0.76). 

2. Both PSA-32 and MWA-32T catalogs show evi- 
dence for systematic errors in the fluxes of sources 
near bright sources, the likely explanation for which 
is errors in CLEANing the bright sources. 

3. The MWA-32T catalog shows evidence for a sys- 
tematic flux error of sources as a function of RA 
likely due to an error in the primary beam model 
combined with the mosaicking of facets along the 
RA direction. Due to its construction from a num- 
ber of overlapping facets along RA, the PSA-32 
catalog does not show a similar artifact. 

We summarize the sources of error in the three sets of 
flux measurements (two MWA32 facets, one PAPER mosaic) 
into several categories, outlined in Table 3. Types 
of errors as deduced from the MWA facet analysis are 
given in the upper part of the table, whereas intercomparisons 
between the two catalogs are given below the 
dividing line. 

All EoR telescopes must demonstrate the ability to 
make reliable and repeatable measurements. Employing 
the lessons learned in this early stage, we summarize 
the implications of Table 3 for the improvements necessary 
for EoR experiments as follows: 

1. Flux scale is currently not accurate to better than 
20%. Because the power spectrum scales as the square of 
the flux, this implies a ~ 40% uncertainty in the 
power spectrum. This is most likely due to primary 
beam uncertainties in transferring fluxes between 
calibrators, and also due to CLEAN uncertainties. 



Table 3 

Error Budget 

Source of Error                         Fractional Error   Refer to 
-------------------------------------------------------------------- 
Flux measurement (peak vs fit)          4.5%               Section 3 
Primary beam                            25%                Figure 7 
Edge of beam                            100%               Figure 7 
CLEAN of bright sources                 50%                Figure 10 
-------------------------------------------------------------------- 
Theoretical bandwidth mismatch          5%                 Section 2 
Actual difference between telescopes    20%                Figure 5 


2. The precision of the sky model is sufficient to accurately 
subtract ~ 80% of bright foreground sources, 
which is a significant distance from the 0.25% requirement 
to be able to subtract sources and work 
within the EoR "wedge". Future work should be 
able to improve on this dramatically, though it is 
not obvious that this two-order-of-magnitude 
requirement can be reached. 

3. The CLEAN algorithm introduces correlated errors 
between sources. Catalogs should include informa- 
tion about the degree of correlation. This informa- 
tion would then inform the comparison likelihood 
model. 

4. Work towards improving primary beam accuracy 
is of utmost importance for both experiments and 
for EoR measurements generally, as for polarization 
(Moore …), image reconstruction, and fully 
holographic imaging (Sullivan et al. 2012), and is 
also currently the limiting factor in the accuracy of 
the catalogs. 
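
The step from a 20% flux scale uncertainty to a ~ 40% power spectrum uncertainty in item 1 can be written out explicitly. The short derivation below assumes only that the power spectrum amplitude P depends on the flux scale S through P ∝ S²; the symbols P and S are introduced here for illustration.

```latex
P \propto S^{2}
\quad\Longrightarrow\quad
\frac{\delta P}{P} \simeq 2\,\frac{\delta S}{S} = 2 \times 0.20 = 0.40 ,
\qquad
\text{while the exact change is } (1 + 0.20)^{2} - 1 = 0.44 .
```

The first-order value of 40% and the exact value of 44% agree at the level of precision quoted in the text.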

To address implication 1 we recommend establishment 
of a system of reference sources with detailed and repeated 
measurements by both instruments. We should note that 
the only reason the flux relationship fit converges on a 
single stationary gaussian-like probability distribution is 
the 30% fractional error bar listed in the MWA32 catalog. This 
large fractional error was designed to match the approximate 
scale of deviation from Culgoora values and appears 
consistent with the facet comparison analysis above 
(Figure 6). The significance of this comparison is in the 
successful application of a new method for comparing 
catalogs. The MCMC likelihood algorithm allows the 
addition of more detailed error models that take into 
account position- and flux-dependent errors like those described 
above. For the reasons noted in Section 3, inter-catalog 
comparisons should both take into account the quoted 
errors and quote the resulting range of model parameters 
that could relate the two. It should be noted that the 
probabilistic method used to relate PSA32 to MWA32 
could be extended to take into account a more detailed, 
non-gaussian error model and, in principle, can also be 
used to assess the correctness of the individual object 
error bars from either catalog, with the addition of a 
likelihood for the errors. Extra catalog metadata, such 
as the correlation between measurements, as suggested in 
number 3, could also be folded into the likelihood model. 
This is a subject for future work. 
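
One way correlation metadata could enter such a likelihood is through a full covariance matrix in place of independent error bars. The sketch below is an assumption about how this might be done and is not a method taken from either catalog. The function name, the example fluxes and the correlation coefficient of 0.5 are hypothetical.

```python
import numpy as np

def log_likelihood_correlated(m, b, s_ref, s_obs, cov):
    """Gaussian log-likelihood of s_obs = m * s_ref + b with a full covariance
    matrix, so that correlated errors between sources are accounted for."""
    resid = s_obs - (m * s_ref + b)
    _, logdet = np.linalg.slogdet(cov)
    chi2 = resid @ np.linalg.solve(cov, resid)
    n = len(resid)
    return -0.5 * (chi2 + logdet + n * np.log(2.0 * np.pi))

# Hypothetical three-source example with 30% fractional errors
s_ref = np.array([10.0, 20.0, 40.0])
s_obs = np.array([11.0, 19.0, 43.0])
sigma = 0.3 * s_obs

# Assume the first two sources are neighbors whose errors are correlated
corr = np.array([[1.0, 0.5, 0.0],
                 [0.5, 1.0, 0.0],
                 [0.0, 0.0, 1.0]])
cov = np.outer(sigma, sigma) * corr

ll_corr = log_likelihood_correlated(1.0, 0.0, s_ref, s_obs, cov)
ll_diag = log_likelihood_correlated(1.0, 0.0, s_ref, s_obs, np.diag(sigma ** 2))
print(f"log-likelihood with correlation: {ll_corr:.3f}, without: {ll_diag:.3f}")
```

With a diagonal covariance this reduces to the usual sum of independent Gaussian terms, so a catalog that publishes no correlation information is recovered as a special case.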

Regarding point 3, CLEAN incorporates little prior 
knowledge into its result. This is a good choice for narrow 
field of view instruments observing an unknown sky. 
But wide field of view deconvolution always encompasses 
many oft-measured sources. Future deconvolution efforts 
should incorporate known fluxes as prior data. One 
example of a method which could incorporate priors in 
this way is the Fast Holographic Deconvolution algorithm 
(Sullivan et al. 2012), which provides a faster forward 
model suitable for building a likelihood-based 
approach. 

Of all the observed systematics, the beam model error 
is the largest, making it clear that more effort must be 
devoted to measuring the primary beam. We note that in 
the case of the MWA, the beam error was discernible be- 
cause two deep, independently imaged facets happened 
to overlap each other, allowing comparison of many dim 
sources. This suggests 1) that the images used to gen- 
erate a catalog should be published along with the list 
of source fluxes (both PAPER and MWA images are available 
only "on request") and 2) that surveys should be 
arranged so that each source measurement is repeated at 
differing hour angles, observing it at different points in 
the primary antenna beam. 